Tech Arsenal 1

home *** CD-ROM | disk | FTP | other *** search

/ Tech Arsenal 1 / Tech Arsenal (Arsenal Computer).ISO / tek-05 / pcprl11.zip / MAIN.DOC < prev next >

Wrap

Text File | 1992-02-06 | 111KB | 3,083 lines

A Guide to Parallel Programming Using PCS-Linda and the Parallel Lan System Parallel Computer Solutions November 30, 1991 Contents Disclaimer:. . . . . . . . . . . . . . . . . . . . . . . . . . 4 Linda. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 The Language Linda. . . . . . . . . . . . . . . . . . . . 5 Linda Primitives. . . . . . . . . . . . . . . . . . . . . 6 Tuples. . . . . . . . . . . . . . . . . . . . . . . . . . 6 Elements. . . . . . . . . . . . . . . . . . . . . . . . . 6 Actuals . . . . . . . . . . . . . . . . . . . . . . . . . 6 Formals . . . . . . . . . . . . . . . . . . . . . . . . . 7 Tuple Space . . . . . . . . . . . . . . . . . . . . . . . 7 Tuple Matching. . . . . . . . . . . . . . . . . . . . . . 8 Further Examples. . . . . . . . . . . . . . . . . . . . . 9 Note on real elements . . . . . . . . . . . . . . . . . . 9 Linda Instructions . . . . . . . . . . . . . . . . . . . . . 10 IN. . . . . . . . . . . . . . . . . . . . . . . . . . . 10 RD. . . . . . . . . . . . . . . . . . . . . . . . . . . 10 OUT . . . . . . . . . . . . . . . . . . . . . . . . . . 11 EVAL. . . . . . . . . . . . . . . . . . . . . . . . . . 11 INP . . . . . . . . . . . . . . . . . . . . . . . . . . 12 RDP . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Hello World Example . . . . . . . . . . . . . . . . . . 12 Parallel Lan System. . . . . . . . . . . . . . . . . . . . . 14 Requirements. . . . . . . . . . . . . . . . . . . . . . 14 Packet Drivers. . . . . . . . . . . . . . . . . . . . . 15 Master . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Multiple Processor Communication. . . . . . . . . . . . 17 Different Masters . . . . . . . . . . . . . . . . . . . 18 Machine . . . . . . . . . . . . . . . . . . . . . . . . 19 DOS . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Extended Memory . . . . . . . . . . . . . . . . . . . . 19 Master Installation . . . . . . . . . . . . . . . . . . 20 Worker . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Worker Installation . . . . . . . . . . . . . . . . . . 22 Developer. . . . . . . . . . . . . . . . . . . . . . . . . . 24 Compiler Directives . . . . . . . . . . . . . . . . . . 26 Units and System Code . . . . . . . . . . . . . . . . . 27 Startup Procedure . . . . . . . . . . . . . . . . . . . 27 Procedure Declarations. . . . . . . . . . . . . . . . . 30 Global Variables. . . . . . . . . . . . . . . . . . . . 30 Readln/Writeln. . . . . . . . . . . . . . . . . . . . . 30 Formal Parameters . . . . . . . . . . . . . . . . . . . 31 Nesting Procedures. . . . . . . . . . . . . . . . . . . 31 Multiple Procedures . . . . . . . . . . . . . . . . . . 31 CAUTION!. . . . . . . . . . . . . . . . . . . . . . . . 31 Developer Program Design. . . . . . . . . . . . . . . . 32 Tuning. . . . . . . . . . . . . . . . . . . . . . . . . 32 Debugging . . . . . . . . . . . . . . . . . . . . . . . 33 Main . . . . . . . . . . . . . . . . . . . . . . . . . 33 Skeleton Code . . . . . . . . . . . . . . . . . . . . . 34 What the Developer Doesn't Show . . . . . . . . . . . . 35 RELAX!. . . . . . . . . . . . . . . . . . . . . . . . . 35 PCS-Linda. . . . . . . . . . . . . . . . . . . . . . . . . . 36 Linda Conversion Program. . . . . . . . . . . . . . . . 36 Tuple Elements. . . . . . . . . . . . . . . . . . . . . 36 Actuals - Integer/Real. . . . . . . . . . . . . . . . . 37 Integers. . . . . . . . . . . . . . . . . . . . . . . . 37 REALS . . . . . . . . . . . . . . . . . . . . . . . . . 38 Formals . . . . . . . . . . . . . . . . . . . . . . . . 38 INP & RDP . . . . . . . . . . . . . . . . . . . . . . . 38 EVAL. . . . . . . . . . . . . . . . . . . . . . . . . . 39 Tuple Size. . . . . . . . . . . . . . . . . . . . . . . 40 What Does OUT (); Produce . . . . . . . . . . . . . . . 40 An Example . . . . . . . . . . . . . . . . . . . . . . . . . 41 Analysis. . . . . . . . . . . . . . . . . . . . . . . . 43 Final Code. . . . . . . . . . . . . . . . . . . . . . . 44 Mandelbrot . . . . . . . . . . . . . . . . . . . . . . . . . 46 Graphics Screen . . . . . . . . . . . . . . . . . . . . 46 Sequential Program. . . . . . . . . . . . . . . . . . . 47 Parallel Programming Methods. . . . . . . . . . . . . . 51 Developer . . . . . . . . . . . . . . . . . . . . . . . 51 Columns . . . . . . . . . . . . . . . . . . . . . . . . 52 Results . . . . . . . . . . . . . . . . . . . . . . . . 52 Analysis. . . . . . . . . . . . . . . . . . . . . . . . 54 Worker. . . . . . . . . . . . . . . . . . . . . . . . . 55 Complete Code . . . . . . . . . . . . . . . . . . . . . 59 Disclaimer: Copyright 1991 Parallel Computer Solutions. All Rights Reserved The terms and conditions governing the sale of Parallel Computer Solutions, hardware products and the licensing of Parallel Computer Solutions, software consist solely of those set forth in the written contracts between Parallel Computer Solutions, and its customers. No representation or other affirmation of facts contained in this publication, including but not limited to statements regarding capacity, response time, suitability for use or performance of products described herein shall be deemed to be a warranty by Parallel Computer Solutions, for any purpose or give rise to any liability by Parallel Computer Solutions, whatever. No part of this document may be copied or reproduced in any form or by any means without the prior written consent of Parallel Computer Solutions. Information is subject to change without notice. Printed in the United States of America. Linda is a registered trademark of SCIENTIFIC Computing Associates, Inc. PCS-LINDA is a trademark of Parallel Computer Solutions Parallel Lan System is a trademark of Parallel Computer Solutions Turbo Pascal is a copyright of Borland International, Inc. Linda Linda is a parallel processing coordination language. To most computer scientists and engineers, this is a new concept. Commonly, sequential languages such as C and Pascal are enhanced to support parallel processing; Concurrent C and Parallel Pascal are two such languages. However, if we look at the actual code that makes up a parallel application we see that not all of the code is parallel. It can't be. If a parallel application was completely parallel, one instruction would be executed on each machine. This is not reasonable of course. Parallel programs are written such that certain pieces of code are farmed off to other processors while other code is executed on the host machine to pull all of the pieces together. Now there are machines where this is not true but we will not discuss those. These pieces of code farmed to other processor are typically small and perform a specific operation on some data. They are commonly called processes. Let's look at a common real world situation to put these ideas together. Large, busy offices in a corporation usually are made up of many different people doing some task or process. If we were to let this office run itself, it would become a mess in a very short time. Some people would be doing tasks that had already been done. There would be miscommunication. You can probably think of many more things that could go wrong. What this office needs is an office manager. An office manager isn is responsible for pulling the individual people into a working team. Each team member working for the greater good of the corporation. The Language Linda The same can be said for Linda. Linda is a language developed at YALE University and is copyrighted by Scientific Computing Associates, Inc. Linda however lacks the common features you and I commonly think of when we hear about a new language. There are no loops, decisions, records, etc in Linda. All of these common functions are provided in some other language used in conjunction with Linda. This language could be C, Pascal, Forth or any other language. The primitive functions of Linda are coded in a particular language and can be used just as any other statement can be used. Linda coordinates the activities of many different executing processes on different machines. As we will see later, these machines do not have to be a single parallel machine such as the Connection Machine. The Linda system we will be using coordinates the activities of different processes running on IBM PCs connected by an ethernet local area network. As we describe some of the characteristics and feature of Linda, we will introduce our version of Linda called PCS-Linda. There are features of Linda that can be difficult to implement on top of an existing language and compiler therefore, we have not implemented everything in our version. Linda Primitives It may be surprising to find out that Linda has just six instructions. These six instructions are in, out, rd, eval, inp, rdp. Each of which will be discussed in detail later in this section. First we must talk about how Linda coordinates processes. Linda uses a distributed data structure called a tuple. A tuple is a data structure that can be accessed by any process thus the distributed part. Tuples Tuples have two parts: a name and some number of elements. The name of a tuple is any combination of up to sixteen characters. The name serves as the primary matching characteristic for the tuples. ( name, e1, e2, e3, e4, e5, e6 ) The name can be either a variable ( string ) or enclosed in single quotes. The elements are filled from left to right, Elements A tuple can have from zero to six elements. Each of the elements will be a value or variable of a particular type supported in the host language Linda is coded in. In PCS-Linda the types can be integer, real, array, and record. Strings are not allowed in the current PCS-Linda system. Elements, in addition to having a specific type, can have a further description of actual or formal. Actuals When we speak of actual, we are concerned with an actual value. This value can be either the numerical representation or can be contained in a variable. When using a number as an element, we simply state the number in one of the element fields. ( 'example1', 1, 2 ) is an example of a tuple with name 'example1' and actuals 1 and 2, each of type integer. If we had two variables a and b of type integer, a, b : integer; and we wanted to use the values contained in these variables, we would precede each of the variable by the & symbol. The tuple ( 'example2', &a, &b ) would be an example of a tuple with name 'example2' and actuals a and b, of type integer. If this tuple was used in a program, the integer values assigned to the variables a and b would be compiled into the tuple definition. In addition to integers, reals or floating point values can be used as actuals. Thus we can have ( 'example3', 3.141592, 1.2e+08 ) can also be used as a tuple. In most cases, the only thing that differentiates an integer from a real number is the decimal point and therefore must be included when specifying a real number. Formals Formal elements are more like variables. They are used to collect a value from a tuple. They are distinguishable by the lack of the & symbol. A tuple with formal elements would appear as ( 'example4', a, b, c ) The variables a, b, and c would be assigned to values returned after a match has taken place on the tuple. Formal variables can be any of the previous data types mention except strings. Tuple Space Now that we have tuples, we have to have a place to keep them. This storage place is called the tuple space. The tuple space could be envisioned as a large bag full of tuples. The tuple space can be likened to shared memory in that all processors have access to the tuples in the tuple space. However, unlike shared memory, there are no critical sections in the tuple space. Any of the tuples can be used by any processor at any time. Now there are ways to control access to the tuple space by using tuples that emulate semaphores and the like. In order for a process to access a tuple, a matching has to take place. Thus a processor would sent a template of the tuple it would like to match in the tuple space. The machine holding the tuple space would perform a search on all of the tuple in the tuple space. As soon as a match is found, the matching tuple is sent to the processor that requested the match. The tuple space in our system is located on a single machine called a master. The master is responsible for holding tuples and matching tuples. When tuples are put into the tuple space, the master does not check for duplicate tuples. Any number of duplicate tuples can exist in the tuple space. There are specific rules that the master will follow when matching tuples. Tuple Matching RULE 1 : Tuple names must match both length and character for character. RULE 2 : Actuals match actuals if of the same type and contain the same value. RULE 3 : Actuals match formals if they are of the same type and length. RULE 4 : Formals never match formals. Let's look at each rule using some examples. Assume we have the following tuples in our tuple space. ( 'stuff', row, col, 1 ); ( 'coord', &x, &y ); ( 'work', &segments, &adder ); ( 'Junk', 1, 3, 5 ); Row, col, x, y are integers; Adder is a procedure; Segments is a record of length 18; We have to match the tuple ( 'stuff', 100, 200, start ) with a tuple in our tuple space. First off, RULE 1 is satisfied with the first tuple in tuple space because they both have names of length 5 and match character for character ( stuff = stuff ). The elements of the tuples are matched next. The tuple to be match has an actual element of type integer in the first position and value 100. The first tuple in tuple space has a formal of type integer in the first position. By RULE 3, the first elements match in each of these tuples. Further study shows that the second and third elements match as well. Further Examples The tuple ( 'stuff', 3.4, 200, start ) does not match the tuple ( 'stuff', row, col, 1 ) because the types of the first element are different. The tuple ( 'stuff', &a, &b, &start ) may or may not match the tuple ( 'stuff', row, col, 1 ). A determination cannot be made because we do not have a value for the variable start. If start equals one, then we have a match but if start equals any other integer, the tuples do not match. Note on real elements As we all know, computers have a tough time representing real or floating point numbers well. Some numbers can be represented exactly such as 0.5 but how does a computer represent the value of 2/3. At some point, the computer will have to round to 0.666666667 which is not correct. This inaccuracy must be kept in mind when using real number as element types in tuples. If during a calculation, a real number is used in a tuple, there may not be a match because of rounding. Thus it is probably wise to avoid matching on reals if at all possible. In most cases, you will probably get a match but that one time when you don't, keep the above in mind when looking for the problem. Linda Instructions Now that we have tuples and a place to put them, we must look at the individual Linda instructions and determine what each one of them does and how to use them. This section will be ended with the common 'Hello World' problem coded in Linda and Pascal. The first four Linda instructions discussed are the most common ones used. The last two are variants of two of the common ones. IN The IN instruction is used to request a match on a tuple from tuple space. The format of the instruction is IN ( name, e1, e2, e3, e4, e5, e6 ) Only the elements necessary are included in the tuple. The tuple specified with the IN instruction is sent to the master for a match with a tuple in tuple space. If there is a positive match, the master will return the matching tuple. If there are any formal elements, they are assigned the appropriate values from the matching tuple. If there is not a tuple in tuple space that matches the tuple sent by the IN instruction, the master keeps the tuple and continually check the tuple space for a match. In the meantime, the processor that sent the IN instruction will block execution until a matching tuple is sent. Once that master has a match, it is sent. When the master sends the matching tuple to the requesting processor, it is permanently deleted from tuple space. If there are multiple copies of the same tuple in tuple space, the first one is taken. The master will take the first tuple that it determines matches the tuple sent with the instruction. RD The RD instruction is somewhat identical to the IN instruction. The format of the RD is RD ( name, e1, e2, e3, e4, e5, e6 ) The RD instruction sends a tuple to the master for a match with a tuple in tuple space. If the master has a match, it sends a copy of the tuple found in tuple space. If the master does not have a match, it will continually search tuple space for a match. Meanwhile, the processor that sent the RD instruction will block execution until a tuple is returned. When the master finds a tuple, it sends a copy of the tuple to the requesting processor. The tuple is not deleted from tuple space. This allows a single tuple to be read from any processor without the overhead of INing the tuple and OUTing the tuple just to get a copy of the tuple. The RD instruction should be used when communicating common data to a number of different processors. OUT We have seen how to request tuples from tuple space but how do we get tuples into tuple space to begin with. The OUT instruction enables us to put tuples into tuple space. The format of the OUT instruction is out ( name, e1, e2, e3, e4, e5, e6 ) The tuple with the OUT instruction is sent to the master and is put into tuple space. Any number of the same tuples can be put into tuple space. In all cases of actuals, the actual values are sent to the master in the tuple. There are no pointers referencing data in a different processors memory. The master does not respond to the OUT instruction. It is simply taken for granted the tuple is put into tuple space. Once an OUT has been performed, any processor can access the tuple including the processor that issued the OUT instruction in the first place. EVAL The EVAL instruction is the most important of the Linda instruction. Without it, the other instructions are of no use. The EVAL instruction is used to put code into tuple space. This code is picked up by workers and executed. The format of the EVAL instruction is eval ( name, e1, e2, e3, e4, e5, e6 ) In PCS-Linda, the eval instruction has not been fully developed to the specification of the original Linda EVAL. In PCS-Linda the format of the EVAL instruction is eval ( 'work', &procedure ); All EVAL instruction must name the tuple 'work'. Because tuple with the same name can reside in tuple space at the same time this is not a problem. The first element of the tuple will be an actual designated by the & operator and the name of the procedure that is to be put into tuple space. So if we wanted a procedure called MANDEL to be put into tuple space to be executed, the following instruction would be used eval ( 'work', &mandel ); INP The INP instruction is identical to the IN instruction except for one condition. If the master finds a tuple match when the instruction is received, it will return the tuple. If a tuple is not found in tuple space, the master returns a null tuple to the requestor. Thereby allowing the INP instruction to be evaluated as either true or false depending on whether or not a tuple was matched from tuple space. PCS-Linda implementation information of this instruction will be given later. RDP The RDP instruction is identical to the RD instruction except the master does not continually search for a match if its first attempt in unsuccessful. If a tuple is found, the matching tuple is returned to the requestor. If the master does not find a matching tuple in tuple space, the master will return null. Thus allowing the RDP instruction to be evaluated as either true of false depending on whether or not a tuple is returned. Those are the six instructions that make up the Linda coordination language. Let's look at an example of a 'Hello World' program using Linda and Pascal. Hello World Example We will assume we are working on a system with 8 worker processors. The beginning of our program would be standard Pascal. program hello; const num_proc = 8; Next we have to write the procedure for the workers. procedure world; var count : integer; begin in ( 'count', count ) inc ( count ); out ( 'count', &count out ( 'hello', &count ); end; Now the main procedure. begin out ( 'count', 0 ); for i := 1 to num_proc do eval ( 'work', &world ); for i := 1 to num_proc do begin in ( 'hello', proc ); writeln ( 'Hello from processor', proc ); end; in ( 'count', 8 ); end; The program begins with the tuple called COUNT being put into tuple space with an integer actual of 0. This is followed by eight copies of the WORLD procedure; one for each worker. The main program goes into a loop requesting a HELLO tuple one at a time. Not that the first element is a formal, thus we are only matching on the name of the tuple HELLO. The formal element will have some value on each loop iteration. The numbers my not be in order. As each tuple is found in tuple space, a message is printed that says Hello from processor -. The code ends with an IN which cleans up the tuple space. The workers are instructed to IN the COUNT tuple, increment the number it finds in the first element position and put the tuple back in tuple space for the next processor. After it does this, it puts a tuple in tuple space called HELLO with the number it put into the COUNT tuple. Each worker will get a different number to report back to the main code. Now as we stated above, this code will receive eight tuple with the name HELLO is any order. If we changed the main programs loop slightly, we could guarantee to get the HELLO tuple in order from 1 to 8. for i := 1 to 8 do begin in ( 'hello', &i ); writeln ( 'Hello from process - ', i ); end; We have changed the first element from a formal to an actual. Thus instead of the master having to match the name, it must match the first element as well. Therefore, the it will return a HELLO tuple only when the first element is a 1. Parallel Lan System Now that we know how to program Linda and we have seen a parallel program, we need to begin doing some real work. So now we can sit down to our PC and begin programming the examples. Well not exactly. Our common PC is a single processor machine. We have no way of dividing up work and allowing different processors to work on the pieces. We have two options. We can purchase an expensive parallel machine for several thousands of dollars. Have it installed and teach ourselves the things necessary to program the machine. Or we can use the Parallel Lan System. The Parallel Lan System is a software package that allows IBM PC or compatible machines to operate as a parallel processor. The machines must be connected together by an ethernet local area network. A minimum system consists of three machines: a master, worker, and a developer. Using Turbo Pascal and the Parallel Lan System ( PLS ), a parallel program can be constructed for any parallel algorithm we so desire. In addition, we have a version of Linda called PCS- Linda that we will use to coordinate the activities of the different processes created our the parallel program. The PLS was created to co-exist with other network products on the market such as Novell Netware and DECNet. The system will not interfere with NCSA or PCSA or any of the TCP/IP programs. The Parallel Lan System can even be configured to run several parallel programs on the same network sharing the same processor of the system. Configuration The configuration of the Parallel Lan System is very important to the efficiency of the parallel system. The remaining sections will document the different components of the PLS. Several examples will be presented in the end of the guide. Requirements The Parallel Lan System operates by using an ethernet local area network ( LAN ). LANs are very popular among educational instructions and businesses. Various packages are available to run on networks including Novell Netware and MS Lan Manager. The Parallel Lan System communicates on an ethernet by way of a packet driver. Most ethernet card manufacturers have these drivers available at no cost. In addition, there are public domain packet drivers available for most cards. Packet Drivers Packet drivers are terminate and stay-resident ( TSR ) programs which act as an interface between a developer and an ethernet card. Ethernet cards use a hardware interrupt of a PC. When this interrupt is activated, the packet driver code is activated to perform some function. Likewise, the Parallel Lan System software is able to activate a software interrupt which also activates the packet driver code for its own use. Software can be written that locates the packet driver installed in a machine thus allowing an PC and ethernet card to be used without changing the system software. Manufactures who provide packet drivers will also include information on how to install them. Packet drivers can be used with most network packages. An advantage of using packet driver is multiple applications can access the same ethernet card. The packet driver has the ability to give a packet from the LAN to one application or another based on a type field located in the packet. What we want to do now is install the packet drivers for the machines the system will run on. Follow the instructions given by the manufacturer. With that, we want to verify that everything to this point is operating correctly. On the disk labeled system software is a file called PKTINFO.EXE. When executed, this program will give us information about the ethernet card and packet driver installed on a particular machine. After the packet driver has been installed in a machine, place the system disk in driver A: and type A:PKTINFO If a page of information appears on the screen, the packet driver is working correctly. If a message appears saying no packet driver was found then the packet driver was not installed correctly. After the packet drivers have been verified we want to check the LAN itself. Located on the same disk is a program called TRANS.EXE. This is a simple LAN communication program that will send and receive packets from on PC to another. Pick two machines to execute this software on. One one of them, write down the ethernet address as shown by PKTINFO. Inset the system disk in drive A: and type A:TRANS Take the disk out and do the same for the other machine. On the PC that will receive the packets press SHIFT and R. We now want to select what receive mode to use. Press 2. The screen will blank and a status bar will appear at the top. On the sender machine press SHIFT and S. We need to select a receive mode as we did for the receive machine. The reason is because the receiving machine will send an acknowledgement packet back to the sender. So press 2. We are now asked if we want to use the broadcast feature. Press N. The program is asked for the address of the receive machine. Enter the six bytes written down separated by spaces and press return. Enter a message to be sent to the receiving machine and press enter. The screen will blank and ask you to press a key to send a packet or shift Q to quit. Press any key. If you look at the receiving machine, you message should be on the screen. On each of the screens, there should be a 1 in the upper right hand corner. This indicates that one packet has been sent and one packet has been received. You can continue to press any key to send additional packets. Press SHIFT and Q when ready to quit. Master The master of the Parallel Lan System is the most important machine. It can be likened to the server of a local area network. It must be powerful and able to handle a flood of activity by the worker nodes and the developer. As documented in the manual Using the Parallel Lan System, the master should consists of * At least a 25-MHZ 386 IBM PC or Compatible. * 4 MB of RAM * 16-bit 32k buffer Ethernet card Anything above these specification will allow for better efficiency in the system. Other than running the system software for the master, there is nothing that can be done to the master. Duties The duties of the master are as follows * Administer the MAIN tuple space of the PLS. * Receive and interpret Linda instructions from multiple processors. * Keep track of unanswered Linda instructions (IN,RD). Those are the only responsibilities of the master. However it is a larger responsibility than it may appear. Multiple Processor Communication When considering the concept of multiple processor communication, image this situation. You are sitting in the middle of a ring of sixteen people. All of these people are trying to hold a conversation with you AT THE SAME TIME. The human mind simply cannot handle that much information at the same time. Most people can't even keep track of a single conversation. That is one of the jobs of the master. Things begin with the ethernet card receiving a packet. This packet is put into some memory set aside for incoming packets. The master system software periodically checks to see if there are any packets in the first incoming buffer. If there are packets, it will take as many as there are and partial processes them and put them into a secondary buffer. This step is crucial. The first buffer is static. In other words, it is of a constant size. As much information about this buffer as possible was defined when the master program began executing. If it were to overflow, some packets would be lost ( the sender would send duplicates however. But at a lose of execution time ). After the packets are in the secondary buffer, the system software will processed them fully one at a time as it has time. Processed packets will end up in one of two places. Packets which represent the Linda instructions OUT and EVAL windup in the TUPLE SPACE. Packets representing IN, INP, RD, and RDP will end up in a request queue. The master system software picks from the request queue when it tries to match tuples. In addition to the above queues, the master also has internal data structures that it uses to guarantee that no duplicates packets are processed. We all know what its like to be told the same thing over and over. The master doesn't like it either therefore it eliminates duplicates. Also, the master must keep track of the order of packets from different processors. Just like we try to say a persons first name before their last name, the master must guarantee that a packet sent before another packet arrives first. For more information on the about technical data about the system software, refer to the document The Parallel Lan System -A Look Inside. Different Masters There is a difference in performance between different masters. We are going to take a look at how masters can influence the overall speed of a parallel program. The masters used are * a 4.77-MHZ 8086 IBM PC Compatible, and * a 25-MHZ 80386 IBM PC Compatible. Each of masters was used in a comprehensive set of 40 tests using the Linda mandelbrot program explain in a later chapter. The test documented here was performed using from 1 to 16 workers ( 4.77 MHZ 8086 PC's using 8087 math coprocessors ) each doing 4 columns per computation. The following table shows the difference between the masters. Cpus 8086 Master 80386 Master Speedup 1 642.52 577.91 11% 2 358.89 274.89 24% 4 210.80 141.65 33% 8 188.72 80.36 58% 16 ----.-- 88.87 ----- The chart shows the total seconds for each test using the 8086 master and the 80836 master. The far right column shows the speedup between the two systems. There is a considerable speedup as we approach 8 workers. There was no test performed for 16 workers using the 8086 master. Notice the increase in time between 8 and 16 workers. This example also illustrates the idea that adding more workers does not necessarily decrease execution time of a program. A faster master does indeed make a considerable impact on the speed of the execution of the program. Machine The faster the machine, the faster the tuple space can be searched for needed tuples. The most work the master will perform is interpreting tuples sent to it. The faster it can do this the better. DOS Microsoft recently released DOS 5.0. This DOS has many memory saving features that the system can benefit from. IF the master is equipped with 1 MB of memory or more, DOS can be loaded into high memory as well as device drivers and TSR's. All of this can save precious lower memory. The master software uses lower memory ( as well as extended ) to hold the tuple space. A system using DOS 3.3 or 4.01 will have approximately 300k of heap space available for the tuple space. Dos 5.0 increases this amount to approximately 360k. Extended Memory If we have 1 MB available we can take advantage of the features of DOS 5.0 Any memory over 1 MB can be configured as extended memory by using an extended memory manager. The master software will take advantage of any extended memory available. Using extended memory for some of the tuple space is expensive as far as processing speed is considered. The software uses a system put into public domain to manage extended memory. Unlike conventional memory below 640k, extended memory cannot be accessed directly. A system of segments is created in extended memory. These segments are copied into conventional memory to be manipulated and then put back. The segments are 32k in size. This means there is a 64k memory copy performed for each and every packet that must be put into extended memory. Now simple memory management has been put into place which will keep the most recently used extended memory extent in conventional memory. However, to execute programs which has a large amount of data, extended memory is essential. At Parallel Computer Solutions, a 4 MB system has been used in all cases with no problem as far as space requirements. Network Cards Not all cards are created equal. We have 8-bit, 16-bit, and 32-bit cards available for PCs. To the best of my knowledge, 32- bit cards are available for EISA bus system only. Therefore, most PC's will use either the 8 or 16 bit cards. If we have a 80286 or better machine, we will want to use 16-bit cards. 16-bits ethernet cards have several advantages. The first is the addition of 8 data lines. More information can be transferred on 16 lines than 8 lines. A second reason is buffer size. Most 8-bit ethernet cards have an 8k buffer for incoming packets. For most applications this is fine but the master will be receiving packets from many different PCs at the same time. We want to have a large buffer available if the master is busy searching tuple space or some other system function. 16- bit ethernet cards have either 32k or 64k buffers for incoming packets. You usually get what you pay for when buying a computer system. The master is the most important component in the Parallel Lan System and therefore it should be the best system available. Master Installation Obtain a blank diskette to copy the master software from the system diskette. To make the system easy to install, create a bootable diskette with the appropriate memory managers, etc for you system. Copy the packet driver for the ethernet card onto the new diskette. Copy the file MASTER.EXE to the net diskette. Label this disk as bootable and containing the master system software. Execute the master software on the appropriate machine by typing A:MASTER A message will appear telling how much extended memory is available and used by the master system. Press return to get to the next screen. The following should appear ( master screen - initial picture ) This is the initial screen for the master. The master sits in a loop waiting for a tuple to be sent to it. Under normal circumstances, the master does not need any attention. The operation of the master can be disabled by pressing any key. As activity starts on the system, the screen will change, this screen ( master screen - 2 cpu picture ) shows that 2 cpus are active on the system and the packets sent and received from those cpus. In addition, the sizes of the heap and extended memory is given. Worker The workers are an important part of the Parallel Lan System. The workers do one thing: perform calculations. The least powerful machine available for a worker is: * 4.77 Mhz 8088 IBM PC or Compatible * DOS 3.3 * 640K RAM * 8-bit ethernet card Just as in the case of the master, the performance of the Parallel Lan System can be determined by the power of the worker machines. Testing of the Parallel Lan System was performed on both 4.77 Mhz 8088 workers and 20 Mhz 80386 workers. Both systems provided speedup simply because of the parallel execution of the test program. But the 80386 workers gave additional speed advantages by as much as 60/80%. The following four charts show the times required to perform mandelbrot using 8088 workers and 80386 workers. The master in the first two charts was an 8088 machine. In the second two charts, the master was an 80386 machine. ( charts ) In an established LAN we do not have much choice in what type machine is used as a worker. The charts should show you that processor power certainly makes a difference in both the master and workers. Worker Installation We must now install software on each of the worker machine. Follow the previous directions for setting up a master diskette but instead of copying master.exe we need to copy a worker program. Turbo Pascal is available in two different versions: 5.5 and 6.0. They are different in code generation. Therefore, we have two different worker pieces of software called worker5.exe and worker6.exe. If you are using Turbo Pascal 5.5 then copy worker5.exe to a:worker.exe and if you are using Turbo Pascal 6.0 then copy worker6.exe to a:worker.exe. Insert the diskette into driver A: and type A:worker The screen will blank and you will be asked the address of the master. If you did not write the address of the master down, you can look on the screen of the machine it is executing on. In the upper left hand corner of the screen is the ethernet address of this machine. On each of the worker systems, enter the six bytes separated by spaces. Once the system has digested the master address, it will attempt to establish communications with the master. Recall that the purpose of the worker is to execute code given to it by the developer system. The worker will get work from the master by sending a tuple of the form in ( 'work' ); to the master. All packets sent over the ethernet are given a specific number in order to keep a sequence. the system software gives the number 2 to the first packet sent out by any system. We can verify that a worker is communicating with the master by the messages put on the master and worker screens. A worker has communicated successfully if on the screen of the master is a message IN - 2 for each of the workers. Since a packet has been sent to the master, the master has to acknowledge the packet, therefore each of the workers should have a message in the right most box with the number 2 next to it. If this is the case, then this worker has been successful added to the system. If this is not the case, then either the master's address was not correctly entered or the address in not that of the master. The worker software halts when a key is pressed. So if no message appears on the screen , press a key on the worker machine and try again by executing the worker software again. Developer The developer is a machine on the system which runs and coordinates the executing parallel program. It will begin a program by submitting any number of worker processes. The responsibilities of the developer are * Submit jobs for worker processors * Coordinate the submitted jobs * Produce results The developer is the foreman for the Parallel Lan System. A programmer will code a parallel program on the developer, compile it, and run it without moving to different machines. One way to look at it is you are programming a single PC which just happens to have any number of subprocessors available to help speed up a particular program. Again it needs to be pointer out that if you are using Turbo Pascal 5.5, the workers should be executing worker5.exe and if you are using Turbo Pascal 6.0, the workers should be executing worker6.exe. The Parallel Lan System developer software is setup to initially recognize files with an extension of .PLS. Turbo Pascal and our system files can be set up in such a way as to simplify use of the system. In the root directory of you hard drive create a directory called PLS using the command mkdir pls Move into that directory with cd pls Insert the system software diskette into drive A: and copy the following files into our new directory work*.tpu both*.tpu *.pls linda.exe *.doc Now back out of this directory with cd .. Using an editor change your autoexec.bat file to include the turbo directory in the system path. If no PATH directive appears in you autoexec.bat file, enter the following line path c:\tp - or whatever the turbo pascal directory name is After you have changed the file type autoexec This is make the change effective. Now change into your PLS directory and type turbo linear.pls Turbo's main screen will appear. We need to make sure the compiler settings are set correctly before we do anything else. Press ALT and O, move to the COMPILER and press return. Move the bar to FORCE FAR CALLS and change the entry to YES. Check that the entry BOOLEAN EXPRESSIONS is set to SHORT-CIRCUIT. Move to the first menu and select the SAVE OPTIONS entry and press return. Save the new options. Now on the screen will be the parallel program for solving linear equations. Try to compile this program by pressing F9. You should get an error on the first Linda command IN. What we need to do is execute the conversion program LINDA.EXE. Linda.exe is explained in the document, Linda Conversion Program User Guide. To do this in a convenient manner, press ALT and F. Move to the entry EXIT TO DOS and press return. This will exit us to DOS but keep Turbo Pascal in memory. Run the conversion program by typing linda linear The linda program will create a file called Linear.pas that is made up of Turbo Pascal statements only. Type exit and press return. This will return us back to Turbo Pascal. We are still working on the file Linear.PLS. Press ALT and F and move the bar to PICK and press return. Pick a new file and enter linear.pas. This will bring up the file linear.pas. Press F9 to compile the program, it will compile successfully. If there had been an error, we would have wanted to press ALT and F, move the bar to PICK and select linear.pls to make any changes, run linda again, and select linear.pas. Although this may seem a hassle, it is the only way to incorporate a preprocessor in the Turbo environment. Future releases will not require this. Compiler Directives During the development of the system, several things were learned about how Turbo Pascal ( TP ) generates code. TP allows for several different types of float point variable. Reals, double, extended and several other are the chooses that we have. In most applications, real variables are good enough for our calculations. TP will generate code for reals automatically. In order to generate code to handle double or extended variables, the compiler directives {$N+,E+,F+} are necessary. So what the problem. The problem comes about when there are constants in your code. Such as a := 1.0; The 1.0 is a constant as far as TP is concerned. When TP generates code for normal reals, it appears that the constant value is stored in the code itself. When double or extended reals are used, TP does not put the constant into the code itself. The constant value is stored at the top of the procedure that the constant appears in. The code procedure test; var a : extended; begin a := 1.0; end; TP generates the code. 0000803F - 1.0 55 - push bp 89E5 - mov bp,sp B80A00 - mov ax, 000A 9A7C02AF5C - call 5CAF:027C 83EC0A - sub sp,000A CD3C99060000 - fld cs:dword ptr[0000] The mnemonic FLD loads the real constant into the 80x87 or emulator. The constant is located at cs:dword ptr[0000] which when disassembled, is the first line of this code segment. The main code has a call statement CALL 0004 which calls this procedure code thus bypassing the real value. Now what is means to us is, we should always include the compiler directive {$N+,E+,F+} in our program. It is not always needed obviously, but it is safe to include it. If you write an application that uses reals, there may be an error. The system code has been tested on machines with and without numerical coprocessors. One test included some workers with coprocessors and some without. The above compiler directive worked for both setups. Units and System Code There are a considerable number of routines that are needed to perform the operations of the Parallel Lan System. By incorporating these routines into a single Turbo Pascal Unit, we are able to successfully hide them from the developer. Turbo Pascal requires units to be included in a program by using the command USES. Programs intended to execute on the PLS should start with the code: Program .......... {$N+,E+,F+} Uses work, both; There are two units for the PLS; work and both and should be in the directory where the program being written is. By USEing this unit, we have access to procedures and variables that we will use directly in the writing of our parallel programs. Now in the working directory, there are two different version of the work and both units. Work5 and both5 should be used when using Turbo Pascal version 5.x and work6 and both6 should be used when using Turbo Pascal 6.0. Startup Procedure We have developed a parallel environment which operates in coordination with Turbo Pascal. Because of this, there are several problems that must be dealt with directly. To solve several problems, a parallel program must include, what we call, a startup procedure. The purpose of this procedure is two-fold. First is to initialize the system. The second procedure of the startup procedure is to border procedure which are destined to be sent to worker processes. Turbo Pascal creates, as far as I can tell, procedures by first coding all real constants and placing them into the code. So if a statement in the language like a := 1.0; Turbo Pascal will encode the 1.0 into the code such as 0000 3F55 ( or something like this ) After the constants, code is generated to set up the stack for any formal or local variables. push bp mov bp,sp mov ax, ---- ( the number of bytes required for variables ) call ---- There one or two calls to Turbo Pascal procedures. It is my guess that the stack is checked for overflow as well as other housekeeping activities associated with the stack. The code to perform the function so the procedure are generated next. After this code, the stack is returned to its original state. The last code generated is a return instruction. The 80x86 family of microprocessors have several different return opcodes. These return opcodes are for near, far, and interrupt routines. Our system requires all procedure and function calls to be generated as far. The reason for this is so Turbo Pascal will always generate a RETF instruction which us $CB hex. By always generating this particular instruction, the system software can make a fairly good guess at where a procedure ends and another begins. This, if we wanted to generate code for the procedures. procedure startup begin end; procedure tosend; begin end; The code generated would look something like ( procedure startup ) $55 push bp mov sp,bp mov ax,---- call . . . $CB retf ( procedure tosend ) $55 push bp mov sp,bp mov ax,---- call . . . $CB retf If our code was going to send the procedure tosend to a worker node, we would grab the starting location of the procedure, say 0030 hex, by using the @ operator of Turbo Pascal. The system code backtracks until it finds a $CB value. What we are doing is looking for any real constants. The memory from the start of a procedure back to the next procedure's end $CB is copied to the end of the code we are sending to the workers. Now why the startup procedure. If we did not have a procedure coded before the procedure we are sending to the nodes, the system could search for a long time before another $CB is encountered. By incorporating the startup procedure, we are guarantee to find a $CB. The turbo debugger invaluable when trying to determine exactly what Turbo Pascal does during code generation. The startup procedure should be included in the developer code right after the compiler directives. Program ............... {$N+,E+,F+} Procedure startup; begin end; Now that we have the basics of the startup procedure, we need to put some code into it. The first two lines of the procedure are exitsave := exitproc; exitproc := @myexit; These lines of code link our exit procedure into the system exit procedure which is part of any Turbo Pascal program. In the event of a run-time error, our exit routine will do some internal memory deallocation and other system functions that allow the system to fail gracefully. The next size lines of code identify which machine is the master. Ethernet addresses are codes as six bytes which as 00 00 C0 05 86 24 hex. Find out the address of the ethernet card the master system software will execute on using the pktinfo program included with the system. The lines of code necessary to identify the master to the developer program looks like master[x] := $yy; There should be six lines identical to the above with x ranging from 1 to 6. The $ operator identifies the following two characters are hexidecimal. The last line of code in the startup procedure is init_system; The system initialization code has been put into this single call in order to keep the startup procedure as simple as possible. Procedure Declarations The power of the Parallel Lan System lies in the ability to send procedures to worker nodes. The system is not limited to just sending of a single procedure. Any number of procedures can be sent to workers. There are several rules that have to be followed when designing procedure which will be sent to workers. Global Variables When Turbo Pascal compiles a program, space is set aside in a separate data segment in the PC's memory for variables. The memory location of a specific variable is recorded in the compiled code. If a global variable is reference in a procedure sent to a worker, the value obtained when this variable is used will not be the value that we want. The contents of the memory location referenced will be undetermined because the worker was not aware that of that particular global variable. Worse yet is the assignment of a global variable. The value assigned to the variable will be placed in the workers memory somewhere. It is possible that the assignment will cause the worker to fail. Global variables should be passed by the developer to worker machine through the Linda instructions and the tuple space. Readln/Writeln Procedures sent to workers cannot have readln or writeln statements in them. The reason for this is Turbo Pascal treats the screen and keyboard as files. If used, a FILE NOT OPENED error will appear and the worker will fail. Formal Parameters There can not be formal parameters in the procedures sent to workers. Because we are using a version of the Linda coordination language, there is no need for formal parameters. The function of formal parameters are performed by Linda instructions. Nesting Procedures Turbo Pascal does not follow the source code exactly when generating code. One such case is nesting procedures. Logic would dictate that the code generated would model that of the source, where by the procedure code generated by the compiler would include procedure nesting. Turbo Pascal does not do this, nor should it. Nested procedures are coded as single procedures and a call in made to them as needed. This however causes us some problems. When our system generates tuples to send to workers, it is performed at runtime, therefore we do not have access to the symbol table. We cannot determine which procedure to send and which not to. The end result is we are unable to handle nested procedures with this release of the system. Multiple Procedures There are programs which are prone to decomposition requiring multiple procedures. The system is setup such that procedure sent to workers can coexist with normal procedures. The developer can use normal procedures during the execution of a program. When a program has more than just a single procedure to be sent to the worker nodes, order is not important. The system will locate the appropriate procedure when asked for. Say we have a program which requires two different procedure to be sent to workers. Can the developer program or programmer determine which worker will get which procedure? No, we can not determine in what order the workers will get the procedures. If it is important, a system of semaphores could be used. CAUTION! There is one important note that must be made about the code that can appear in a worker procedure. Turbo Pascal performs smart linking when ever a program is compiled. In other words, if you use the sqrt library function, it is extracted from the system unit and placed in your code. The entire system unit is not compiled with your code. Therefore, STATEMENTS FROM THE SYSTEM UNIT CANNOT BE USED IN YOUR WORKER PROCEDURES! The reason for this is the worker software cannot effectively be compiled with all of the functions included in the system unit. Once again, this is a limitation brought about from building this system on top of the conventions of a pre-existing compiler. Developer Program Design Parallel programming is no easy nor is it straightforward. In this section, I would like to give a few points on understanding the task at hand. This is by no means a tutorial on parallel programming. All parallel system are different. Then again so are all programming languages. Bit as computer scientists, we are required to be able to learn new programming languages easily. Learning to program the Parallel Lan System can be an uncomplicated task if the problem we want to solve is correctly decomposed. The Parallel Lan System is designed as a distributed structure machine. The information or data structures we develop for our problem are kept on a single machine, the master. In order to access this information, a request is sent to the master by way of a tuple. There can be potentially a large amount of information passed among the different systems. The designer of a parallel program on the PLS must keep this i mind. Take for instance on the of the example programs described in the main documentation. Mandelbrot is typically designed as a sequential program which executes on a single pixel at a time. When writing Mandelbrot for a parallel machine, several options are available. Do we want each processor to execute the algorithm on a single pixel before returning result, a column, a row , or multiple of each. The main documentation describe the result . But suffice it to say, executing on a single pixel at a time will achieve parallelism because some number of processors will be calculating a pixel value at any given moment. Bit is not optimal. There is too much communication between processors. We can achieve better results with a particular number of processors by increasing to a column or more. This brings up another point, there are times when adding processors to the system will result in a degradation of the system as a whole. Tuning There will almost always be some parameters in a parallel program that will judge the ultimate speed of the program on the system. It must be kept in mind that the key to parallel programming is to get an initial program up and running. After the program is running, test it and compare to other platforms. A sequence program executing on a single processor A one - worker Parallel Lan System A two - worker Parallel Lan System A four - worker Parallel Lan System An eight - worker Parallel Lan System Graph the results ( using execution time ) on a chart. This will help illustrate the speed up achieved from going to parallel execution. Do not be surprised if several of the tests are slower than a sequential program. At some point, adding more processor should achieve results faster than the sequential program. A rule of thumb can be used in general. If after executing the algorithm on a four processor system the results are not achieved faster than a sequential program, something is wrong with the design of the program. But before submitting to a redesign, check the establish or try to establish tunable parameters. There are always some variables or aspects of the algorithm used in the program which can be tuned to achieve faster results. The pixels was a tunable parameter in the Mandelbrot program. In a fiance program it could be the number of variables used in a regression or the number of data points examined. In the linear algebra program shown in the main documentation, its the number of columns calculated by each processor. Debugging A parallel program can sometimes be more of a challenge than writing the parallel program itself. As of this release there are no debugging facilities, but the source code is provided so you can add any support you feel necessary. My only remark on this subject is to run your program on a one worker system and count out pieces of the code until you have identified where the problem occurs. Then run on a two worker system and do the same thing. Just remember the you have many different tasks executing at the same time. A key to debugging a parallel program is to understand the relationship between these different tasks. Main In the main section of the developer program, the first statement should be a call to start_up. Your code for the actual program goes next. After the code, the statement close_system ends the main section. begin start_up; { user code } close_system; end. Skeleton Code The following is a skeleton of the necessary code for a developer program. It should reproduced and placed in a .PLS file for easy access. The easiest way for keeping the code separate is to name this code skel.pls and perform a copy when developing a new application. program ------; {$N+,E+,F+} uses both, work; { remember: append 5 or 6 to the end of work and both depending on the version of tp } procedure start_up; begin exitsave := exitproc; exitproc := @myexit; master[1] := $--; master[2] := $--; master[3] := $--; master[4] := $--; master[5] := $--; master[6] := $--; init_system; end; { user procedures } begin start_up; { user code } close_system; end. What the Developer Doesn't Show The developer is an independent program. You will not see the same displays as the master and worker have. Therefore, there should be some activity on the developers screen. The programmer will have to provide this activity. The example programs are all graphical in nature therefore it is quite easy to see that everything is going normally. The programmer will want some kind of indication from the developer as to the state of the system as far as the developer is concerned. RELAX! The most important thing to remember when developing parallel programs is to relax. Parallel programming takes time and patience. Parallel programs have to be planned, in my opinion, much more than sequential programs. The results for this is that in a sequential program, one instruction is going to follow another. Even in the case of decisions and loops. There is no question about it. It can be traced and reproduced many times. ( Unless there is a subtle bug in the program that occurs only every three thousand iterations ). In a parallel program, there is absolutely no guarantee as to the order that instructions will be executed by the different machine. Even if the workers are executing the same piece of code. I strongly recommend coding the programs presented later and running them. Get comfortable with the system and then experiment. You can always reset the machines and start the system again if something really messes up. PCS-Linda In the above section on LINDA, we touched on the different instructions and either use in coordinating parallel programs. Now we want to look at PCS-Linda. PCS-Linda is the dialetic of Linda that the Parallel Lan System is programmed with; along with Turbo Pascal. In order to get Linda is execute using a pre- existing compiler, several requirements had to be made as far as using Linda was concerned. This section will discuss those requirements. Linda Conversion Program Turbo Pascal will not recognize PCS-Linda. Try it once. You will get an error message on any of the six instructions. The Linda Conversion Program ( LCP ) is a preprocessor for the PLS and PCS-Linda. After you have coded a parallel program using Turbo Pascal and PCSL, you must give the program a name ending with the extension '.PLS'. Once this file is created, you can run the LCP to convert your program. LCP will perform some magic and produce a file with the same name ending with '.PAS'. This file can now be successfully compiled with Turbo Pascal. The LCP will produce code that Turbo Pascal can recognize for each of the linda commands. If there is an error in the Turbo Pascal code, verify that all variable have been declared and things like this. Always remember to make changes to the '.PLS' file and not the '.PAS' file. For more information about LCP, refer to the document Linda Conversion Program. Tuple Elements Tuples in PCS-Linda can consist of zero to six elements. The elements can be one of four different types * Actuals * Formals * Data * Null Actuals and formal elements will be discussed next. The last two types, the programmer and user have no control over. When an EVAL instruction is executed, a procedure is sent to the master or worker. This procedure is considered to be a data element. In reality it is an actual but there are special needs that must be met when searching and transferring procedure to and from the different machines. The data type allows for cleaner code. The null type is used in every tuple where there is less than six elements. Elements not used in a tuple are given the type null. The null type uses no memory therefore it is a convenient form of termination for these empty elements. Actuals - Integer/Real Using a pre-existing compiler caused several different things to happen in the development of the Parallel Lan System and its associated Linda language. This can be seen when using actuals and formals in the tuples. Integers Turbo Pascal has several different types of integers; integer, long, word, byte. Each of these can be used in a tuple but there is a rule to their use. If an integer is to be sent in a tuple, it can be represented in two ways. The first is to put the integer into the tuple ('info', 1, 5, 8) The 1, 5 and 8 will be interpreted as integer and stored accordingly. The integer used in the two-complement two-byte integer. Therefore the tuple ('info', 1234567) would case a compile error when the final code was compiled. This number is too large for an integer stored in two bytes. The same is true if an single byte was to be sent. It would be stored in two-bytes instead of one. In order to sent the above number, we could use the tuple bigint : long; bigint := 1234567; ( 'info', &bigint ); This tuple would effectively send the large number to the master. We would do the same for a byte smallint : byte; smallint := $32; ( 'info', &smallint ); Remember that the tuple sent to match either of the above tuples would also have to have the receiving integer declared as either a long or a byte. REALS The same situation as above occurs with reals as well. When a real is put into a tuple ( 'info', 1.234, 1.4E+3 ) The real is evaluated as a 6-byte real number. This is an acceptable real number for most applications but in the case of more complex mathematics, a large real may be necessary. Turbo Pascal support reals as either real, single, double, extended, and comp. Each of these tpus can be used just as we did for integers. To send the real number 1.234e+1023 we would use the tuple bigreal : extended; bigreal := 1.234e+1023; ( 'info', &bigreal ); Again, the variable to receive this real number would have to declared as an extended as well. Formals Formals must match the types they are to receive. Because we are programming both the code the worker will receive and the code the developer will execute, we can easily match the types. However, it may happen that in a case where we were using reals and the accuracy was not enough so we change to extended reals, we missed a declaration. The system will not match the tuple because a standard real is 6 bytes and an extended is 10 bytes in length. INP & RDP The pre-existing compiler strikes again. The INP and RDP instruction treated as function normally. They will returned either true or false if a tuple was matched by the master. However, we were unable to get the functionality of these two instructions exactly. In PCS-Linda the RDp and INP instructions must be assigned to a boolean variable such as ok := RDP ( 'info', bigreal ); If a tuple was matched, bigreal will contain the real value sent with the tuple and ok will be set to true. If a null tuple was sent by the master, ok will be set to false, and bigreal will not be changed. If we were to use the RDP ( INP can be substituted wherever RDP is used ) to control a loop we would do the following ok := rdp ( 'info', bigreal ); counter := 1; while not ok do begin inc ( counter ); ok := rdp ( 'info', bigreal ); end; This loop would count the number of times the RDP instruction was used before a successful tuple match was found. In Original Linda from Scientific Computing Associates, Inc. this loop would appear as counter := 1; while not rdp ( 'info', bigreal ) do; inc ( counter ); This is not a big change but it is one that should be kept in mind when using the RDP and INP instructions. EVAL The EVAL instructions is used to send a procedure to be executed by a worker. PCS-Linda only allows one element in the tuple used for the EVAL instruction * The first element must be &procedure name The name of the tuple must be 'work'. The worker system software has been precompiled to look for a tuple of this name with the above elements. Because duplicate elements are allowed in tuple space, the name 'work' will not cause a problem when sending more than a single procedure to be executed by workers. Thus the tuples eval ( 'work', &compute ); eval ( 'work', &result ); will not conflict with each other. Tuple Size The tuple size of the present Parallel Lan System is 16000 bytes. This will allow a large amount of data to be sent between machine per instruction. If this is not acceptable for your situation, you receive garbage when trying to execute your parallel program, ( the workers will probably get confused as your code executes ), call us and we can increase your systems tuple size. What Does OUT (); Produce Lastly, we would like to show what the instruction var start_col : integer; results : array[1..200] of integer; out ( 'col', &start_col, &results ); produces when put through the Linda Conversion Program. begin make_tuple ( 3, 'c', 'o', 'l', ' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ',' ', @start_col, sizeof (start_col), yes, @results, sizeof (results), yes, nil,0,null, nil,0,null, nil,0,null, nil,0,null ); send_tuple ( out ); end; All of the above is required for a single PCS-Linda instruction. It is beyond the scope of this guide to detail what is being done here. For more information about the technical nature of the Parallel Lan System consult the paper - Parallel Lan System - A Course In Overcoming. An Example Let's start using the things we have learned by formulating an example exercise. We want to have multiple processors find the sum of all the integers between 1 and 20. We want to write the program in PCS-Linda and Turbo Pascal. Because of the nature of the PLS, if your system only has a single worker, the code will still execute correctly. We want our developer to put the numbers 1 through 20 into tuple space and wait for a sum to be computed. the first thing we should consider is what our workers are going to do. Our workers are suppose to take a number and add it to a sum. The workers are going to get this number and the sum from tuple space. So let's get these two things in ( 'num', a ); in ( 'sum', sum ); Both a and sum will be declared as integers for this problem. Once the worker has these two things, it will add the number in a to the running sum. sum := sum + a; After which, the worker has to put the new sum back in tuple space. out ( 'sum', &sum ); The worker is finished. Several questions should come to mind. First of all, why did we IN both a and sum, why not use RD instead. The first thing that we want to happen is a worker to get the sum, which should be zero since we are starting the addition sequence. The worker should then get one of the numbers to add to the current sum. The first worker will request the current 'sum' tuple and a 'num' tuple. The worker will add the number to the sum and get a value of 1 for sum. This new sum is placed back in tuple space for the next worker. If we did not remove the 'sum' and 'num' tuple, we would have a tuple space that was filled with multiple 'sum' tuples and duplicate 'num' tuples. When the next worker requests a 'sum' and a 'num' tuple, it may get the 'sum' tuple that has a value of 0 and not the one with the value of 1. The duty of the developer program was to put the work procedure and necessary values in tuple space. The code will look like out ( 'sum', 0 ); for i := 1 to 20 do begin eval ( 'work', &worker ); out ( 'num', &i ); end; The developer will end by INing the ending sum and printing the result. in ( 'sum', &sum ); The program looks like this: program add; {$N+,E+,F+} uses work, both; var i, sum : integer; procedure startup; begin exitsave := exitproc; exitproc := @myexit; master[1] := $--; master[2] := $--; master[3] := $--; master[4] := $--; master[5] := $--; master[6] := $--; init_system; end; procedure worker; var a, sum : integer; begin in ( 'num' , a ); in ( 'sum', sum ); sum := sum + num; out ( 'sum', &sum ); end; begin startup; out ( 'sum', 0 ); for i := 1 to 20 do begin eval ( 'work', &worker ); out ( 'num', &i ); end; in ( 'sum', sum ); writeln ( sum ); close_system; end; Analysis Now look at this code carefully. As soon as the worker receive their procedures to execute, they try to IN two tuples named 'num' and 'sum'. But what does the developer do, it immediately tries to IN the 'sum' tuple. Now we cannot predict who will get the 'sum' tuple first but the developer is in line to get it. We want the developer to IN the 'sum' tuple after all tuples have added the values to sum not while they are currently adding the values. We must put something into the code which will delay the developer from getting the 'sum' tuple until after all workers have added all values to sum. We could put a delay, time wise, into the developer code. Delays are dangerous though. Too many parameters go into the circumstances surrounding delays. We need a tuple that can be increment when a value is added to sum; like a loop control variable. We can do this fairly easily in PCS- Linda. Let's define a tuple called 'count' ( 'count', count ); Before any values are added to sum, count should be set to zero and placed into the tuple space. out ( 'count', 0 ); The worker code must include instructions to read the 'count' tuple and increment it when a value is added to sum. in ('count', count ); inc ( count ); out ( 'count, &count ); The most important part is up to the developer. We do not want to read the final 'sum' tuple until after all values have been added to sum. We have two instructions that can be used; either IN or RD. If we use RD it will leave the 'count' tuple in the tuple space. So let's use IN. But what do we want to IN. We want to read the final count of 20. in ( 'count', 20 ); This IN instructions will send a tuple to be matched to the master. Nothing will happen in the developer until a match is made with the tuple 'count' having a value of 20. Final Code program add; {$N+,E+,F+} uses work, both; var i, sum, count : integer; procedure startup; begin exitsave := exitproc; exitproc := @myexit; master[1] := $--; master[2] := $--; master[3] := $--; master[4] := $--; master[5] := $--; master[6] := $--; init_system; end; procedure worker; var a, sum : integer; begin in ( 'num' , a ); in ( 'sum', sum ); sum := sum + num; out ( 'sum', &sum ); in ( 'count', count ); inc ( count ); out ( 'count' , &count ); end; begin startup; out ( 'sum', 0 ); out ( 'count', 0 ); for i := 1 to 20 do begin eval ( 'work', &worker ); out ( 'num', &i ); end; in ( 'count', 20 ); in ( 'sum', sum ); writeln ( sum ); close_system; end; Mandelbrot Mathematicians are able to generate wonderful things with numbers. One particular thing created from numbers are pictures. The mandelbrot complex number set is a set of numbers that when put through a particular set of equations generates a picture. This manual is not the place for a detailed description of the mandelbrot calculations. Books on fractals will typically have a detailed look at mandelbrot numbers. We will describe as much as needed for the calculations and the parallel program we will write. Graphics Screen In order to plot a picture on a graphics screen, we have to create a relationship between each of the pixels on the screen and a particular complex number. For the mandelbrot program we present, we are going to concentrate on a 320 x 200 portion of the VGA graphics screen available on IBM PCs or compatible. Since we are going to be using complex number is the calculations, we need to define what a complex number is. A complex number has both a real and an imaginary part for each number. There will always be two number for every complex number. Turbo Pascal does not have a complex number type. Using the TYPE feature of Pascal, we can create a complex number type type complex = record realp, imag : extended; end; The nature of the calculations suggests that we use extended real units in order to achieve the best possible accuracy. For the mandelbrot calculations, a corner value is selected which acts as a reference point for the calculations bcorner : complex; This corner represented the upper left hand corner. For the standard mandelbrot picture, bcorner is given the value bcorner.realp := -2.0; bcorner.imag := -1.25; In addition to the corner, we must know the lengths of the sides of the picture both width and height. The value typically used for both the width and height is 2.50. Notice that all of these values can be changed to get different pictures of the mandelbrot set. The last values needed for the calculations are called gap values. The calculations are such that the precision of the numbers can be changed for different screen resolutions. The calculations use two different gap values, one for the width of the graphics screen and one for the height. gap1 : extended; {width} gap2 : extended; {height} Because all of the calculations will use the same gap values, they can be predefined. The gap values are calculated by taking the length of each side ( 2.50 ) and divide it by the number of pixels in each dimension. Therefore we have the values gap1 := 2.50 / 320; gap2 := 2.50 / 200; What these gaps say is there is 2.50/320 space between pixels on the graphics screen being used to display the mandelbrot number set. When higher resolution screens are used, the gap gets smaller and smaller thus enhancing the resolution of the picture. Sequential Program We have the basic values for the mandelbrot picture we wish to procedure. The only thing left is to do the actual calculations. The calculation that must be done for each pixel is z := z2 + bcorner; When the size of z grows to be greater than 4.0, we can stop the calculations. Mandelbrot numbers are characterized by not approaching 4.0. Therefore, these number could be put through the above equation many times. This suggests that we need some sort of variable to control the total number of times we do the calculation. A good control value would be 50. If a value is put through the calculation 50 times and the size is less than 4.0, the number belongs to the mandelbrot set. Values in the mandelbrot set are colored black in our picture. Values that reach 4.0 before 50 are colored different colors. The actual calculation procedure will appear this way function calculate ( col, row, bcorner, ncomplex ) : integer; begin size := 0.0; result := 0; original.realp := ncomplex.realp; original.imag := original.imag; while ( results < 50 ) and ( size < 4.0 ) do begin original.realp := sqr(original.realp) - sqr ( original.imag ) + ncomplex.realp original.imag := 2*original.realp*original.imag + ncomplex.imag; size := sqr ( original.realp ) + sqr ( original.imag ); inc ( result ); end; calculate := result; end; This function is performed for each pixel in the screen. In a 320 x 200 graphic screen there are 64000 calls to this function. In a large graphics screen such as 1024 x 768, there would be 786,432 calls to the function. One parameter in the equation not yet examined is ncomplex. Ncomplex is a complex number which represents the actual pixel on the graphics screen. Ncomplex is calculated by adding the distance the pixel is from the corner of the screen ncomplex.realp := current_column_pixel * gap1 + bcorner.realp; ncomplex.imag := current_row_pixel * gap2 + bcorner.imag; So each pixel starts at the bcorner and adds the number of gaps in each direction the pixel is from the upper left corner, thus multiplication by the gaps. The entire sequential program appears as program mandel; {$N+,E+,F+} (* use 8087 if present or emulate if not *) uses dos, crt , graph; type complex = record realp : extended; imag : extended; end; var bcorner, ncomplex : complex; ccol, crow, row, column, result : integer; gap1, gap2, side1, side2 : extended; procedure get_coordinates; begin clrscr; write ( 'Enter the real part of the lower left corner ( -2.0 ) :'); readln ( bcorner.realp ); write ( 'Enter the imaginary part of the lower left corner ( -1.25 ) :'); readln ( bcorner.imag ); write ( 'Enter the length of real edge ( 2.50 ) :'); readln ( side1 ); write ( 'Enter the length of imaginary edge ( 2.50 ) :'); readln ( side2 ); write ( 'Enter the pixels length of a row ( 320 ) :'); readln ( row ); write ( 'Enter the pixels length of a column ( 200 ) :'); readln ( column ); end; procedure compute_stuff; begin gap1 := side1 / row; gap2 := side2 / column; end; procedure prepare_screen; var graphdriver, graphmode : integer; begin graphdriver := vga; graphmode := vgahi; initgraph ( graphdriver , graphmode , 'c:\tp' ); end; procedure plot ( crow , ccol , result : integer ); var color : word; begin color := 1; if result < 2 then color := 1; if result > 2 then color := 9; if result > 4 then color := 2; if result > 6 then color := 10; if result > 8 then color := 4; if result > 10 then color := 12; if result > 12 then color := 5; if result > 14 then color := 13; if result > 16 then color := 8; if result > 18 then color := 7; if result > 20 then color := 0; putpixel ( ccol , crow , color ); end; procedure get_complex; begin ncomplex.realp := ccol * gap1 + bcorner.realp; ncomplex.imag := crow * gap2 + bcorner.imag; end; procedure calculate_mandel; var original : complex; size : extended; begin result := 0; original.realp := 0.0; original.imag := 0.0; while ( result <= 21 ) and ( size < 4.0 ) do begin original.realp := original.realp*original.realp - orginal.imag*original.imag + ncomplex.realp; original.imag := 2 * ( original.realp * original.imag ) + ncomplex.imag; size := original.realp * original.realp + original.imag * original.imag; inc ( result ); end; end; begin get_coordinates; prepare_screen; compute_stuff; for ccol := 1 to row do for crow := 1 to column do begin get_complex ( crow , ccol , gap1 , ncomplex , bcorner ); calculate_mandel ( ncomplex , result ); plot ( crow , ccol , result ); end; end. This program can be compiled using Turbo Pascal 5.5 or 6.0 and executed to demonstrate the picture that will be produced. The row and column entries can be increased to 640 by 480 if desired. Parallel Programming Methods The mandelbrot program represents a large number of tasks which perform the same operations on a large set of data. Each and every pixel in the graphics screen has to be calculation on using the calculate function presented above. There is no way around it. Thus if we have two processor available for work, we can be doing two pixels at an given moment instead of just one. Just think, if we had 64000 processors, all pixels would be calculated at the same time. Think about the speed up. Our job is to write a parallel program using PCS-Linda and Turbo Pascal that will perform the mandelbrot program using any given number of processors. Developer The easiest way to approach this problem is to first consider the job or duties of the developer. After we have determined the job of the developer, we can look at the code the worker will execute. Recall from above that the calculations for each pixel all rely on a set of simple calculations which were performed only once. These include number of pixels in each column number of pixels in each row length of column length of row gap value for column gap value for row corner value column to compute number of columns to compute In addition, the graphics screen of the system must also be initialized. Columns We must ask a question about how to distribute the work to the workers. The sequential program computed pixels one at a time. Each pixel is then plotted on the screen. We could allow each worker to only compute one pixel at a time. Because of the nature of the Parallel Lan System, there would be some communication overhead for each pixel. In reality it might take longer for a number of processors to compute the screen than it would for a single processor if we only allow each worker to compute one pixel. A solution to this problem may be to allow each worker to compute an entire column of pixels. This would significantly cut down on the communication because there would only be one exchange for every 200 pixels if our example problem. This is a much better solution. By allowing each worker to computer a column of pixels, we can introduce another value that will tell each worker how many columns to compute. We might run tests to determine if each worker should have one column, two, four, or more between communication calls. Results The developer is responsible for setting up the calculations as well as receiving the results. As each worker finishes its column or columns, it will have to put the results into the tuple space for the developer to access. Once the developer has a set of results, it can pass to them to a procedure to be plotted. After 320 result packets have been received, the developer can perform any housecleaning necessary and quit execution. The code for the developer looks like this procedure plot ( col : integer; results : resu ); var i,j : integer; color : word; begin for j := 0 to num_col-1 do for i := 1 to 200 do begin if results[i+j*200] < 20 then color := 1; if results[i+j*200] > 20 then color := 9; if results[i+j*200] > 40 then color := 2; if results[i+j*200] > 60 then color := 10; if results[i+j*200] > 80 then color := 4; if results[i+j*200] > 100 then color := 12; if results[i+j*200] > 120 then color := 5; if results[i+j*200] > 140 then color := 13; if results[i+j*200] > 160 then color := 8; if results[i+j*200] > 180 then color := 7; if results[i+j*200] > 200 then color := 0; putpixel ( col+j , i, color ); end; end; begin start_up; graphdriver := vga; graphmode := vgahi; initgraph ( graphdriver, graphmode, 'a:' ); gap1 := 2.50 / 320; gap2 := 2.50 / 200; bcorner.realp := -2.0; bcorner.imag := -1.25; cur_col := 1; tot_col := 320; pix_col := 200; num_col := 1; for i := 1 to num_proc do begin eval ( 'work', &adder ); out ( 'stuff', &gap1, &gap2, &bcorner ); end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); for i := 1 to tot_col div num_col do begin in ( 'col', start_col, results ); plot ( start_col, results ); end; readln; { system_shutdown } close_system end. Analysis The developer starts by calling the startup procedure for initialization of the network. After the network is setup, the screen must be changed to graphics mode. graphdriver := vga; graphmode := vgahi; initgraph ( graphdriver, graphmode, 'a:' ); These Turbo Pascal commands instruct the graphic card in the PC to switch to VGA mode. The Turbo Pascal BGI file for VGA must be on the drive specified in the initgraph statement. The developer proceeds to establish the constants for the program. Gap1, gap2, and bcorner are the same as in the sequential program. gap1 := 2.50 / 320; gap2 := 2.50 / 200; bcorner.realp := -2.0; bcorner.imag := -1.25; The next four statements set up the working environment for the workers. cur_col := 1; tot_col := 320; pix_col := 200; num_col := 1; Cur_col is the number of the next column that needs to be computed. Tot_col is the total number of columns that are to be computed. Pix_col is the total number of pixels in each column. Num_col is the step value or the number of columns each worker is suppose to compute. These values will all be shared by the different workers in the system. Once all of the values are computed, the developer is ready to put the work and initialization values in tuple space for the workers to pick up. for i := 1 to num_proc do begin eval ( 'work', &adder ); out ( 'stuff', &gap1, &gap2, &bcorner ); end; Num_proc is the total number of processor on the system or the number of processors to be utilized in the calculations. The common values for the workers is the next tuple put into tuple space. out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); The developer is now free to concentrate on its own activities mainly receiving and plotting the results. for i := 1 to tot_col DIV num_col do begin in ( 'col', start_col, results ); plot ( start_col, results ); end; Num_col is used to divide the total number of 'col' tuples to receive based on the number of columns each worker will calculate. The last thing the developer needs to do is clean up the tuple space and shutdown the system in ( 'screen', cur_col, tot_col, pix_col, num_col ); close_system. Worker Next we must determine what the duty of the worker is. The worker must IN the common data from tuple space. This data would include the gaps and corner value. The worker would then IN the screen information. It is this information that will determine whether or not a column needs to be computed. If the worker IN the tuple and the cur_col to compute is greater than the tot_col value, then the system has successfully computed all columns. The worker ca quit executing this procedure. If the cur_col is not greater than the tot_col value, it will have to determine how many columns to compute and compute them. The code for the worker looks like this procedure adder; type complex = record realp, imag : double; end; var plen : integer; pkt : pointer; a_tuple : tuple_pointer; gap1,gap2, a,b,c, size : double; bcorner, ncomplex, original : complex; cur_col, tot_col, pix_col, num_col, cc, row, r, indexc, result, start_col, end_col : integer; results : array[1..200*num_col] of integer; finished : boolean; begin in ( 'stuff', gap1, gap2, bcorner ); in ( 'screen', cur_col, tot_col, pix_col, num_col ); if cur_col > tot_col then begin finished := true; end else begin finished := false; start_col := cur_col; end_col := cur_col + num_col - 1; cur_col := cur_col + num_col; end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); while not finished do begin indexc := 0; for cc := start_col to end_col do begin for r := 1 to pix_col do begin ncomplex.realp := cc * gap1 + bcorner.realp; ncomplex.imag := r * gap2 + bcorner.imag; result := 0; size := 0.0; original.realp := 0.0; original.imag := 0.0; while ( result <= 21 ) and ( size < 4.0 ) do begin a := original.realp * original.realp; b := original.realp * original.imag; c := original.imag * original.imag; original.realp := a-c+ncomplex.realp; original.imag := b+b+ncomplex.imag; size := original.realp*original.realp+original.imag*original.imag; inc ( result ); end; results[indexc+r] := result; end; indexc := indexc+pix_col; end; out ( 'col', &start_col, &results ); in ( 'screen', cur_col, tot_col, pix_col, num_col ); if cur_col > tot_col then begin finished := true; end else begin start_col := cur_col; end_col := cur_col + num_col-1; cur_col := cur_col + num_col; end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); end; end; The worker begins by INing the common information in ( 'stuff', gap1, gap2, bcorner ); followed by the screen information in ( 'screen', cur_col, tot_col, pix_col, num_col ) and determines if there is anything to compute if cur_col > tot_col then if there isn't anything to do, the boolean variable finished will be set to true and the compute loop will not be entered. If there is work to do, the worker will obtain its starting columns and put it into the variable start_col. The ending columns will be put into the variable end_col. The cur_col will be increased the number of columns the worker is going to compute. Once all of this has bee determined, the screen tuple is put back into tuple space. begin finished := true; end else begin finished := false; start_col := cur_col; end_col := cur_col + num+col - 1; cur_col := cur_col + num+col; end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); Notice that in either case of the if statement, the screen tuple is put back into the tuple space. If this were not done, the other workers in the system would block because they could not find the screen tuple. Therefore, it must be put back into tuple space. In order to simplify the way results are sent back to the developer, a single array was used that includes enough space for the total number of pixels a worker computes. This includes the situation where multiple columns are computed. The indexc variable does the job of coordinating where the results will go in the array. The first column of pixels will use the locations 1-200 while the next columns uses the location 201-400, etc. Indexc is incremented by pix_col for every column computed. Once all columns have been computed for this particular group of columns, they are put into tuple space with the out ( 'col', &start_col, &results ); command. The worker will then IN another screen tuple and begin again. Complete Code The complete code for the system is program devman; {$N+,E+,F+} uses dos, crt, both, work, graph; const num_colu = 2; num_proc = 1; type complex = record realp, imag : double; end; resu = array[1..200*num_colu] of integer; var gap1, gap2 : double; bcorner : complex; i, j, cur_col, pix_col, start_col, num_col, tot_col : integer; results : resu; graphdriver, graphmode : integer; procedure start_up; begin exitsave := exitproc; exitproc := @myexit; master[1] := $00; master[2] := $00; master[3] := $C0; master[4] := $05; master[5] := $86; master[6] := $24; init_system; end; procedure adder; type complex = record realp, imag : double; end; var gap1, gap2, a,b,c, size : double; bcorner, ncomplex, original : complex; cur_col, tot_col, num_col, pix_col, cc, row, r, indexc, result, start_col, end_col : integer; results : array[1..200*num_colu] of integer; finished : boolean; begin in ( 'stuff', gap1, gap2, bcorner ); in ( 'screen', cur_col, tot_col, pix_col, num_col ); if cur_col > tot_col then begin finished := true; end else begin finished := false; start_col := cur_col; end_col := cur_col + num_col - 1; cur_col := cur_col + num_col; end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); while not finished do begin indexc := 0; for cc := start_col to end_col do begin for r := 1 to pix_col do begin ncomplex.realp := cc * gap1 + bcorner.realp; ncomplex.imag := r * gap2 + bcorner.imag; result := 0; size := 0.0; original.realp := 0.0; original.imag := 0.0; while ( result <= 21 ) and ( size < 4.0 ) do begin a := original.realp * original.realp; b := original.realp * original.imag; c := original.imag * original.imag; original.realp := a-c+ncomplex.realp; original.imag := b+b+ncomplex.imag; size := original.realp*original.realp+original.imag*original.imag; inc ( result ); end; results[indexc+r] := result; end; indexc := indexc+pix_col; end; out ( 'col', &start_col, &results ); in ( 'screen', cur_col, tot_col, pix_col, num_col ); if cur_col > tot_col then begin finished := true; end else begin start_col := cur_col; end_col := cur_col + num_col-1; cur_col := cur_col + num_col; end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); end; end; procedure plot ( col : integer; results : resu ); var i,j : integer; color : word; begin for j := 0 to num_col-1 do for i := 1 to 200 do begin if results[i+j*200] < 20 then color := 1; if results[i+j*200] > 20 then color := 9; if results[i+j*200] > 40 then color := 2; if results[i+j*200] > 60 then color := 10; if results[i+j*200] > 80 then color := 4; if results[i+j*200] > 100 then color := 12; if results[i+j*200] > 120 then color := 5; if results[i+j*200] > 140 then color := 13; if results[i+j*200] > 160 then color := 8; if results[i+j*200] > 180 then color := 7; if results[i+j*200] > 200 then color := 0; putpixel ( col+j , i, color ); end; end; begin start_up; graphdriver := vga; graphmode := vgahi; initgraph ( graphdriver, graphmode, 'a:' ); gap1 := 2.50 / 320; gap2 := 2.50 / 200; bcorner.realp := -2.0; bcorner.imag := -1.25; cur_col := 1; tot_col := 320; pix_col := 200; num_col := num_colu; for i := 1 to num_proc do begin eval ( 'work', &adder ); out ( 'stuff', &gap1, &gap2, &bcorner ); end; out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col ); for i := 1 to tot_col div num_col do begin in ( 'col', start_col, results ); plot ( start_col, results ); end; in ( 'screen', cur_col, tot_col, pix_col, num_col ); readln; { system_shutdown } close_system end. This code is contained on the samples disk provided with the system. Convert it to standard Turbo Pascal, compile it, and run it for an interesting result. In order to understand what the code is doing exactly, follow the code for a single developer and a single worker. Notice the changes that take place in the screen tuple. This screen tuple is the main controller of the entire program. Once you have the code operational, try changing some of the initial values to obtain different pictures. The color codes ( numbers ) used in the plotting routine can also be change to different sequences. These number were taken directly from those in the Turbo Pascal Reference Manual.